Rows: 16,800,000
Columns: 16
$ job_num <int> 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 10…
$ method <fct> no_covs, all_covs, p_hacked, r, partial_r, full_lm, lass…
$ simulation_id <int> 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3,…
$ estimate <dbl> 0.2158, 0.1872, 0.2218, 0.1812, 0.1812, 0.1812, 0.1872, …
$ SE <dbl> 0.157, 0.151, 0.157, 0.151, 0.151, 0.151, 0.151, 0.156, …
$ p_value <dbl> 0.171, 0.218, 0.159, 0.234, 0.234, 0.234, 0.217, 0.515, …
$ ndf <int> 1, 5, 3, 2, 2, 2, 4, 1, 5, 4, 2, 2, 2, 3, 1, 5, 1, 2, 2,…
$ ddf <int> 148, 144, 146, 147, 147, 147, 145, 148, 144, 145, 147, 1…
$ covs_tpr <dbl> 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1,…
$ covs_fpr <dbl> 0.000, 1.000, 0.667, 0.000, 0.000, 0.000, 0.667, 0.000, …
$ n_obs <int> 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 1…
$ b_x <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ n_covs <int> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,…
$ r_ycov <dbl> 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0…
$ p_good_covs <dbl> 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.…
$ r_cov <dbl> 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0…
Analysis of Covariate Results
Background
We ran batches of simulations to generate data, fit linear models, and extract the results.
In each simulation, we generated data with a dichotomous \(X\). We ran three batches, one for each value of the population parameter for \(X\): 0, 0.3, and 0.5.
We manipulated the following variables:
| Variable | Description | Values |
|---|---|---|
| n_obs | Number of observations in a sample | 100, 150, 200, 300, 400 |
| n_covs | Number of total available covariates | 4, 8, 12, 16 |
| p_good_covs | Proportion of “good” covariates* | 0.25, 0.50, 0.75 |
| r_ycov | Correlation between \(Y\) and covariates | 0.3, 0.5 |
| r_cov | Correlation between the “good” covariates* | 0.3 |
- Note: here we define “good” covariates as ones that have a nonzero relationship with \(Y\).
We fully crossed all levels, yielding 120 unique research settings.
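As a sanity check on the design size, the full crossing can be sketched in Python (levels copied from the table above; the original analysis appears to have been run in R, so this is purely illustrative):

```python
from itertools import product

# Levels copied from the design table above
levels = {
    "n_obs":       [100, 150, 200, 300, 400],
    "n_covs":      [4, 8, 12, 16],
    "p_good_covs": [0.25, 0.50, 0.75],
    "r_ycov":      [0.3, 0.5],
    "r_cov":       [0.3],
}
settings = list(product(*levels.values()))
print(len(settings))  # 120 unique research settings
```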
We used the following 7 methods to select covariates to include in a linear model:
- No covariates
- All covariates
- P-hacking
- R
- Partial R
- Full lm
- LASSO
We fit a linear model for each method and, from the model output, extracted the estimate for \(X\), the standard error of this estimate, and the p-value of the \(X\) effect.
We repeated this 20,000 times for each research setting.
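The per-simulation workflow (generate data, fit a model, extract the \(X\) statistics) can be sketched roughly as below. This is an illustrative Python reconstruction, not the authors' code: the function and parameter names (`simulate_once`, `p_good`, etc.) are invented, covariates are drawn independently for simplicity (the study also correlated the good covariates at `r_cov`), and only the "all covariates" model is fit.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_once(n_obs=150, b_x=0.0, n_covs=4, p_good=0.25, r_ycov=0.3):
    """One simulated dataset plus an 'all covariates' OLS fit (sketch).

    Assumes n_good * r_ycov**2 < 1 so the noise variance stays positive.
    """
    x = rng.integers(0, 2, n_obs).astype(float)      # dichotomous X
    covs = rng.standard_normal((n_obs, n_covs))      # candidate covariates
    n_good = int(n_covs * p_good)
    # Y is built so the first n_good covariates correlate ~r_ycov with Y
    noise_sd = np.sqrt(1.0 - n_good * r_ycov ** 2)
    y = (b_x * x
         + covs[:, :n_good] @ np.full(n_good, r_ycov)
         + noise_sd * rng.standard_normal(n_obs))
    # OLS design: intercept + X + all covariates
    X = np.column_stack([np.ones(n_obs), x, covs])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    df = n_obs - X.shape[1]
    sigma2 = resid @ resid / df
    se_x = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    # the p-value would come from the t distribution with df degrees of freedom
    return beta[1], se_x

est, se = simulate_once()
```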
We present the results here.
Data Analysis
Glimpse data
There is one dataset for each value of \(b_x\) (0, 0.3, and 0.5). Each dataset has 16,800,000 observations: 120 unique settings \(\times\) 20,000 simulations each \(\times\) 7 methods.
Data for \(b_x = 0\), as an example, is shown below.
Zero \(X\) Effect
First, we look at the zero \(X\) effect condition to compare the Type I errors across methods and research settings.
Type I error
We will look at the overall Type I error across methods, then will compare the error of each method across each of the manipulated variables: n_obs, n_covs, p_good_covs, r_ycov, and r_cov.
by method
We will first consider the Type I error by the selection method. Here we calculate the proportion of significant effects (\(p < 0.05\)), the Type I error, displayed below as a bar plot.
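The Type I error here is simply the rejection rate under the null. A minimal Python sketch of the calculation (names hypothetical):

```python
import random

def type_one_error(p_values, alpha=0.05):
    """Proportion of simulations declaring a significant X effect
    when the true effect is b_x = 0."""
    return sum(p < alpha for p in p_values) / len(p_values)

# Under a true null, p-values are uniform, so the rate should sit near alpha
random.seed(0)
null_ps = [random.random() for _ in range(100_000)]
print(round(type_one_error(null_ps), 3))  # close to 0.05
```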
From this plot, we can see that the p-hacking method leads to inflated Type I error rates. The no covariates, all covariates, and r approaches are all at the expected 0.05 mark, while partial r, full lm, and lasso show slight inflation, but are still relatively close.
Here we view the distributions of the Type I error rate by method, beginning by isolating the p-hacked method.
We see that while the average Type I error for the p-hacked method was 0.172, it reached as high as 0.408, further emphasizing the inflation of error.
Removing this invalid method, we can view the distributions of the remaining 6 methods.
We see that the no covariates, all covariates, and r selection methods show approximately normal distributions centered around 0.05, while partial r, full lm, and lasso are slightly right-skewed, with full lm having the greatest skew.
by n_obs
We will view the Type I error rates of each method for the different levels of the number of observations in a sample. In the table below, we see the minimum, maximum, and average Type I error for each level of n_obs by each method.
| method | n_obs | typeI_min | typeI_max | typeI_mean |
|---|---|---|---|---|
| no_covs | 100 | 0.047 | 0.053 | 0.050 |
| no_covs | 150 | 0.046 | 0.052 | 0.049 |
| no_covs | 200 | 0.048 | 0.052 | 0.050 |
| no_covs | 300 | 0.047 | 0.053 | 0.050 |
| no_covs | 400 | 0.048 | 0.052 | 0.050 |
| all_covs | 100 | 0.048 | 0.054 | 0.050 |
| all_covs | 150 | 0.047 | 0.053 | 0.050 |
| all_covs | 200 | 0.048 | 0.053 | 0.050 |
| all_covs | 300 | 0.048 | 0.053 | 0.050 |
| all_covs | 400 | 0.047 | 0.054 | 0.050 |
| p_hacked | 100 | 0.078 | 0.396 | 0.179 |
| p_hacked | 150 | 0.075 | 0.402 | 0.174 |
| p_hacked | 200 | 0.072 | 0.405 | 0.172 |
| p_hacked | 300 | 0.074 | 0.408 | 0.168 |
| p_hacked | 400 | 0.070 | 0.399 | 0.165 |
| r | 100 | 0.048 | 0.053 | 0.050 |
| r | 150 | 0.048 | 0.054 | 0.049 |
| r | 200 | 0.048 | 0.053 | 0.050 |
| r | 300 | 0.048 | 0.054 | 0.050 |
| r | 400 | 0.047 | 0.053 | 0.050 |
| partial_r | 100 | 0.050 | 0.057 | 0.052 |
| partial_r | 150 | 0.048 | 0.056 | 0.051 |
| partial_r | 200 | 0.048 | 0.054 | 0.051 |
| partial_r | 300 | 0.049 | 0.054 | 0.051 |
| partial_r | 400 | 0.047 | 0.053 | 0.051 |
| full_lm | 100 | 0.051 | 0.086 | 0.061 |
| full_lm | 150 | 0.048 | 0.072 | 0.056 |
| full_lm | 200 | 0.049 | 0.067 | 0.055 |
| full_lm | 300 | 0.049 | 0.059 | 0.053 |
| full_lm | 400 | 0.048 | 0.057 | 0.052 |
| lasso | 100 | 0.050 | 0.065 | 0.057 |
| lasso | 150 | 0.049 | 0.063 | 0.054 |
| lasso | 200 | 0.049 | 0.057 | 0.053 |
| lasso | 300 | 0.049 | 0.056 | 0.052 |
| lasso | 400 | 0.048 | 0.056 | 0.051 |
Looking at the average column, we see that the Type I error is not affected much for the no covariates, all covariates, r, and partial r approaches across different sample sizes. However, for p-hacked, full lm, and lasso, the Type I error does decrease as sample size increases. This can be visualized in the plot below.
Again, we see the inflation of Type I error for p-hacking. We also see that lasso and full lm perform worse than the other methods for small sample sizes, but the methods become comparable as sample size increases.
by n_covs
We will view the Type I error rates of each method for the different numbers of available covariates (note: this is not necessarily the number of covariates included in the model). In the table below, we see the minimum, maximum, and average Type I error for each level of n_covs by each method.
| method | n_covs | typeI_min | typeI_max | typeI_mean |
|---|---|---|---|---|
| no_covs | 4 | 0.046 | 0.053 | 0.050 |
| no_covs | 8 | 0.047 | 0.052 | 0.049 |
| no_covs | 12 | 0.047 | 0.052 | 0.050 |
| no_covs | 16 | 0.048 | 0.053 | 0.050 |
| all_covs | 4 | 0.048 | 0.053 | 0.050 |
| all_covs | 8 | 0.047 | 0.054 | 0.050 |
| all_covs | 12 | 0.047 | 0.052 | 0.050 |
| all_covs | 16 | 0.048 | 0.054 | 0.050 |
| p_hacked | 4 | 0.070 | 0.139 | 0.096 |
| p_hacked | 8 | 0.096 | 0.236 | 0.147 |
| p_hacked | 12 | 0.118 | 0.332 | 0.199 |
| p_hacked | 16 | 0.142 | 0.408 | 0.244 |
| r | 4 | 0.048 | 0.054 | 0.050 |
| r | 8 | 0.048 | 0.053 | 0.050 |
| r | 12 | 0.047 | 0.053 | 0.050 |
| r | 16 | 0.048 | 0.054 | 0.050 |
| partial_r | 4 | 0.048 | 0.054 | 0.051 |
| partial_r | 8 | 0.048 | 0.054 | 0.051 |
| partial_r | 12 | 0.047 | 0.055 | 0.051 |
| partial_r | 16 | 0.049 | 0.057 | 0.052 |
| full_lm | 4 | 0.048 | 0.056 | 0.051 |
| full_lm | 8 | 0.049 | 0.061 | 0.054 |
| full_lm | 12 | 0.048 | 0.074 | 0.057 |
| full_lm | 16 | 0.050 | 0.086 | 0.060 |
| lasso | 4 | 0.048 | 0.054 | 0.051 |
| lasso | 8 | 0.049 | 0.056 | 0.053 |
| lasso | 12 | 0.048 | 0.061 | 0.054 |
| lasso | 16 | 0.049 | 0.065 | 0.056 |
Looking at the average column, we see that the Type I error is not affected much for the no covariates, all covariates, r, and partial r approaches across different amounts of available covariates. However, for p-hacked, full lm, and lasso, the Type I error increases as the number of covariates increases. This can be visualized in the plot below.
We see the increase in error for p-hacked, full lm, and lasso methods as the number of covariates increases, while the other methods stay around 0.05.
by p_good_covs
We will view the Type I error rates of each method for the different proportions of “good” covariates. In the table below, we see the minimum, maximum, and average Type I error for each level of p_good_covs by each method.
| method | p_good_covs | typeI_min | typeI_max | typeI_mean |
|---|---|---|---|---|
| no_covs | 0.25 | 0.047 | 0.052 | 0.050 |
| no_covs | 0.50 | 0.046 | 0.053 | 0.050 |
| no_covs | 0.75 | 0.047 | 0.053 | 0.050 |
| all_covs | 0.25 | 0.047 | 0.054 | 0.050 |
| all_covs | 0.50 | 0.048 | 0.054 | 0.050 |
| all_covs | 0.75 | 0.047 | 0.053 | 0.050 |
| p_hacked | 0.25 | 0.070 | 0.260 | 0.135 |
| p_hacked | 0.50 | 0.082 | 0.340 | 0.175 |
| p_hacked | 0.75 | 0.091 | 0.408 | 0.205 |
| r | 0.25 | 0.048 | 0.054 | 0.050 |
| r | 0.50 | 0.048 | 0.053 | 0.050 |
| r | 0.75 | 0.047 | 0.054 | 0.050 |
| partial_r | 0.25 | 0.049 | 0.057 | 0.052 |
| partial_r | 0.50 | 0.048 | 0.055 | 0.051 |
| partial_r | 0.75 | 0.047 | 0.054 | 0.050 |
| full_lm | 0.25 | 0.049 | 0.064 | 0.054 |
| full_lm | 0.50 | 0.048 | 0.076 | 0.055 |
| full_lm | 0.75 | 0.048 | 0.086 | 0.058 |
| lasso | 0.25 | 0.050 | 0.065 | 0.055 |
| lasso | 0.50 | 0.049 | 0.064 | 0.054 |
| lasso | 0.75 | 0.048 | 0.061 | 0.052 |
Looking at the average column, we see that the Type I error is not affected for the no covariates, all covariates, and r approaches across the different proportions. However, for p-hacked, partial r, full lm, and lasso, the Type I error changes as the proportion of good covariates increases. This can be visualized in the plots below.
In this plot, we mainly see the increase in Type I error for the p-hacking approach as the proportion of good covariates increases. However, we cannot see the trends of the other methods clearly, so we will plot this again without the p-hacked line.
In this plot, we see different trends across methods. As there are more good covariates, the Type I error decreases for lasso and partial r, but it increases for full lm.
by correlations
In these batches of simulations, we did not vary the correlation among the good covariates. We will look at the Type I error rates of each method by the correlation between \(Y\) and the good covariates.
| method | r_ycov | typeI_min | typeI_max | typeI_mean |
|---|---|---|---|---|
| no_covs | 0.3 | 0.047 | 0.053 | 0.050 |
| no_covs | 0.5 | 0.046 | 0.053 | 0.050 |
| all_covs | 0.3 | 0.048 | 0.054 | 0.050 |
| all_covs | 0.5 | 0.047 | 0.054 | 0.050 |
| p_hacked | 0.3 | 0.070 | 0.188 | 0.127 |
| p_hacked | 0.5 | 0.081 | 0.408 | 0.217 |
| r | 0.3 | 0.048 | 0.054 | 0.050 |
| r | 0.5 | 0.047 | 0.054 | 0.050 |
| partial_r | 0.3 | 0.049 | 0.057 | 0.051 |
| partial_r | 0.5 | 0.047 | 0.056 | 0.051 |
| full_lm | 0.3 | 0.050 | 0.067 | 0.056 |
| full_lm | 0.5 | 0.048 | 0.086 | 0.055 |
| lasso | 0.3 | 0.049 | 0.065 | 0.054 |
| lasso | 0.5 | 0.048 | 0.063 | 0.053 |
Looking at the average column, we see that the error does not change across correlations for the no covariates, all covariates, r, and partial r approaches. It changes slightly for full lm and lasso, and drastically when p-hacking, such that a higher correlation between \(Y\) and the good covariates increases the Type I error.
In the bar plot, we see the Type I error for the p-hacking method increase as the correlation between \(Y\) and the good covariates increases, alongside only small fluctuations in the Type I error for full lm and lasso.
Estimate, SD, & SE
Here we will compare, across methods, the estimate of \(b_x\), the standard deviation of the estimate, and the average standard error of the estimate. The standard deviation is calculated as the SD of the sampling distribution of the estimates. The standard error is from the linear model output. Since the mean of standard errors would be biased, we calculate the average SE by taking the square root of the mean of the squared standard errors. We compare the differences by subtracting this average linear model SE from the calculated SD.
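The "average SE" described above is a root mean square. A Python sketch of the quantities being compared (helper names are hypothetical):

```python
import math

def average_se(ses):
    """Average SE as the square root of the mean squared SE (RMS),
    since a plain mean of SEs is biased for the sampling SD."""
    return math.sqrt(sum(se ** 2 for se in ses) / len(ses))

def sd_of_estimates(estimates):
    """SD of the sampling distribution of the estimates."""
    n = len(estimates)
    mean = sum(estimates) / n
    return math.sqrt(sum((e - mean) ** 2 for e in estimates) / (n - 1))

def sd_se_difference(estimates, ses):
    """The 'difference' column: calculated SD minus average model SE."""
    return sd_of_estimates(estimates) - average_se(ses)
```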
| method | mean_estimate | SD_estimate | SE_mean | difference |
|---|---|---|---|---|
| no_covs | 0 | 0.148 | 0.148 | 0.000 |
| all_covs | 0 | 0.124 | 0.124 | 0.000 |
| p_hacked | 0 | 0.188 | 0.131 | 0.057 |
| r | 0 | 0.121 | 0.121 | 0.000 |
| partial_r | 0 | 0.122 | 0.121 | 0.001 |
| full_lm | 0 | 0.125 | 0.122 | 0.004 |
| lasso | 0 | 0.123 | 0.121 | 0.002 |
We see that all methods have an average estimate of 0, as expected. The standard deviation of the sampling distribution of the estimate equals the standard error for the no covariates, all covariates, and r selection methods. There are small differences between these values for the partial r, full lm, and lasso methods. The p-hacking method shows a large difference.
Sampling Distributions
Here we view a sampling distribution of the estimate for \(b_x\) for each method.
We see again that each method’s distribution is centered around 0, but the p-hacked method’s distribution is not normal, as the selection process distorts the sampling distribution of the estimates.
From these primary analyses, we see that the p-hacked method leads to inflated Type I error rates and biased parameter estimates. For the following analyses, we will not include the p-hacked method. While partial r, full lm, and lasso selection methods showed slight inflation of Type I error, we might be willing to accept this for greater reductions in Type II error, which we will compare in the next section.
Nonzero \(X\) Effect
Next, we look at the nonzero \(X\) effect condition to compare the Type II errors across methods and research settings. Recall that we set two values for \(b_x\) of 0.3 and 0.5.
Type II Error
We will look at the overall Type II error across methods (except p-hacked), then will again compare the error of each method across each of the manipulated variables: n_obs, n_covs, p_good_covs, r_ycov, and r_cov.
by method
We will first consider the Type II error by the selection method, for both \(b_x = 0.3\) and \(b_x = 0.5\). Here we calculate the proportion of non-significant effects (\(p \geq 0.05\)), the Type II error, displayed below as bar plots.
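Type II error is the complement of power: the non-rejection rate when \(b_x \neq 0\). A minimal sketch (names hypothetical):

```python
def type_two_error(p_values, alpha=0.05):
    """Proportion of simulations that fail to detect a true nonzero effect."""
    return sum(p >= alpha for p in p_values) / len(p_values)

def power(p_values, alpha=0.05):
    """Power = 1 - Type II error."""
    return 1.0 - type_two_error(p_values, alpha)
```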
b_x = 0.3
In this first plot for \(b_x = 0.3\), we see that the Type II error is highest when no covariates are used in the model. There is a large reduction in Type II error when we include all covariates compared to no covariates, and a slight further reduction in Type II error when we use a selection method for covariates compared to including all.
b_x = 0.5
We see a similar trend here, with \(b_x = 0.5\), of a large reduction in Type II error when including all covariates compared to none, and another small reduction when selecting covariates.
In both cases, we see that the full lm method has the highest Type II error.
by n_obs
We will view the Type II error rates of each method for the different levels of the number of observations in a sample. In the tables below, we see the minimum, maximum, and average Type II error for each level of n_obs by each method, for both \(b_x = 0.3\) and \(b_x = 0.5\).
b_x = 0.3
| method | n_obs | typeII_min | typeII_max | typeII_mean |
|---|---|---|---|---|
| no_covs | 100 | 0.676 | 0.687 | 0.682 |
| no_covs | 150 | 0.547 | 0.562 | 0.554 |
| no_covs | 200 | 0.432 | 0.447 | 0.440 |
| no_covs | 300 | 0.259 | 0.269 | 0.265 |
| no_covs | 400 | 0.146 | 0.156 | 0.151 |
| all_covs | 100 | 0.305 | 0.674 | 0.556 |
| all_covs | 150 | 0.121 | 0.527 | 0.387 |
| all_covs | 200 | 0.044 | 0.405 | 0.266 |
| all_covs | 300 | 0.004 | 0.229 | 0.123 |
| all_covs | 400 | 0.000 | 0.121 | 0.056 |
| r | 100 | 0.287 | 0.668 | 0.539 |
| r | 150 | 0.113 | 0.521 | 0.375 |
| r | 200 | 0.041 | 0.401 | 0.258 |
| r | 300 | 0.004 | 0.228 | 0.120 |
| r | 400 | 0.000 | 0.119 | 0.054 |
| partial_r | 100 | 0.285 | 0.661 | 0.532 |
| partial_r | 150 | 0.112 | 0.516 | 0.372 |
| partial_r | 200 | 0.041 | 0.399 | 0.256 |
| partial_r | 300 | 0.004 | 0.226 | 0.119 |
| partial_r | 400 | 0.000 | 0.119 | 0.054 |
| full_lm | 100 | 0.335 | 0.660 | 0.537 |
| full_lm | 150 | 0.144 | 0.516 | 0.376 |
| full_lm | 200 | 0.051 | 0.399 | 0.259 |
| full_lm | 300 | 0.005 | 0.226 | 0.121 |
| full_lm | 400 | 0.000 | 0.119 | 0.055 |
| lasso | 100 | 0.281 | 0.659 | 0.527 |
| lasso | 150 | 0.113 | 0.517 | 0.370 |
| lasso | 200 | 0.042 | 0.399 | 0.255 |
| lasso | 300 | 0.004 | 0.227 | 0.119 |
| lasso | 400 | 0.000 | 0.118 | 0.054 |
In this table, we see that for all methods, as the sample size increases, the Type II error decreases. This can be better seen in the plot below.
Here, we see the decrease in Type II error for the increase in sample size. We see that the no covariates approach has the highest Type II error. From both the table and the plot, we see that for smaller sample sizes, including all covariates in the model yields higher Type II errors, but these become comparable for larger sample sizes.
b_x = 0.5
| method | n_obs | typeII_min | typeII_max | typeII_mean |
|---|---|---|---|---|
| no_covs | 100 | 0.299 | 0.310 | 0.304 |
| no_covs | 150 | 0.134 | 0.146 | 0.141 |
| no_covs | 200 | 0.057 | 0.063 | 0.060 |
| no_covs | 300 | 0.008 | 0.011 | 0.009 |
| no_covs | 400 | 0.001 | 0.002 | 0.001 |
| all_covs | 100 | 0.017 | 0.292 | 0.173 |
| all_covs | 150 | 0.001 | 0.120 | 0.058 |
| all_covs | 200 | 0.000 | 0.044 | 0.019 |
| all_covs | 300 | 0.000 | 0.006 | 0.002 |
| all_covs | 400 | 0.000 | 0.001 | 0.000 |
| r | 100 | 0.013 | 0.281 | 0.157 |
| r | 150 | 0.000 | 0.117 | 0.053 |
| r | 200 | 0.000 | 0.042 | 0.017 |
| r | 300 | 0.000 | 0.005 | 0.002 |
| r | 400 | 0.000 | 0.000 | 0.000 |
| partial_r | 100 | 0.013 | 0.273 | 0.152 |
| partial_r | 150 | 0.000 | 0.114 | 0.051 |
| partial_r | 200 | 0.000 | 0.041 | 0.017 |
| partial_r | 300 | 0.000 | 0.005 | 0.002 |
| partial_r | 400 | 0.000 | 0.000 | 0.000 |
| full_lm | 100 | 0.035 | 0.273 | 0.162 |
| full_lm | 150 | 0.002 | 0.114 | 0.056 |
| full_lm | 200 | 0.000 | 0.041 | 0.018 |
| full_lm | 300 | 0.000 | 0.005 | 0.002 |
| full_lm | 400 | 0.000 | 0.000 | 0.000 |
| lasso | 100 | 0.014 | 0.274 | 0.152 |
| lasso | 150 | 0.000 | 0.116 | 0.051 |
| lasso | 200 | 0.000 | 0.042 | 0.017 |
| lasso | 300 | 0.000 | 0.006 | 0.002 |
| lasso | 400 | 0.000 | 0.001 | 0.000 |
We see lower overall Type II errors, but the same trend of decreasing errors for increasing sample sizes across all methods. We can view this in the plot below.
Similarly, the no covariates approach has the highest Type II error. For small sample sizes, the all covariates approach has higher error, which becomes comparable to the selection methods as sample size increases.
by n_covs
We will view the Type II error rates of each method for the different number of available covariates. In the tables below, we see the minimum, maximum, and average Type II error for each level of n_covs by each method, for both \(b_x = 0.3\) and \(b_x = 0.5\).
b_x = 0.3
| method | n_covs | typeII_min | typeII_max | typeII_mean |
|---|---|---|---|---|
| no_covs | 4 | 0.148 | 0.687 | 0.418 |
| no_covs | 8 | 0.149 | 0.683 | 0.419 |
| no_covs | 12 | 0.151 | 0.684 | 0.418 |
| no_covs | 16 | 0.146 | 0.685 | 0.418 |
| all_covs | 4 | 0.016 | 0.670 | 0.319 |
| all_covs | 8 | 0.004 | 0.665 | 0.280 |
| all_covs | 12 | 0.001 | 0.665 | 0.260 |
| all_covs | 16 | 0.000 | 0.674 | 0.251 |
| r | 4 | 0.015 | 0.668 | 0.316 |
| r | 8 | 0.004 | 0.652 | 0.274 |
| r | 12 | 0.001 | 0.642 | 0.250 |
| r | 16 | 0.000 | 0.642 | 0.237 |
| partial_r | 4 | 0.015 | 0.661 | 0.314 |
| partial_r | 8 | 0.004 | 0.640 | 0.271 |
| partial_r | 12 | 0.001 | 0.629 | 0.247 |
| partial_r | 16 | 0.000 | 0.627 | 0.234 |
| full_lm | 4 | 0.016 | 0.660 | 0.314 |
| full_lm | 8 | 0.004 | 0.641 | 0.273 |
| full_lm | 12 | 0.001 | 0.626 | 0.251 |
| full_lm | 16 | 0.000 | 0.629 | 0.241 |
| lasso | 4 | 0.016 | 0.659 | 0.314 |
| lasso | 8 | 0.004 | 0.636 | 0.271 |
| lasso | 12 | 0.001 | 0.619 | 0.245 |
| lasso | 16 | 0.000 | 0.611 | 0.230 |
From the average column, we can see that Type II error decreases as the number of covariates increases across all methods, except the no covariates method, as this does not depend on the number of covariates. We can see these trends in the plot below.
In this plot, we see the decrease in Type II error for increases in number of covariates. As the number of covariates increases, we see larger reductions in Type II error when selecting the covariates compared to including all available covariates. Lasso performs the best with higher numbers of covariates, followed by partial r, r, and full lm.
b_x = 0.5
| method | n_covs | typeII_min | typeII_max | typeII_mean |
|---|---|---|---|---|
| no_covs | 4 | 0.001 | 0.310 | 0.103 |
| no_covs | 8 | 0.001 | 0.305 | 0.103 |
| no_covs | 12 | 0.001 | 0.307 | 0.103 |
| no_covs | 16 | 0.001 | 0.308 | 0.103 |
| all_covs | 4 | 0.000 | 0.286 | 0.059 |
| all_covs | 8 | 0.000 | 0.275 | 0.049 |
| all_covs | 12 | 0.000 | 0.283 | 0.046 |
| all_covs | 16 | 0.000 | 0.292 | 0.046 |
| r | 4 | 0.000 | 0.281 | 0.058 |
| r | 8 | 0.000 | 0.257 | 0.045 |
| r | 12 | 0.000 | 0.251 | 0.041 |
| r | 16 | 0.000 | 0.246 | 0.039 |
| partial_r | 4 | 0.000 | 0.273 | 0.057 |
| partial_r | 8 | 0.000 | 0.245 | 0.044 |
| partial_r | 12 | 0.000 | 0.236 | 0.039 |
| partial_r | 16 | 0.000 | 0.232 | 0.037 |
| full_lm | 4 | 0.000 | 0.273 | 0.058 |
| full_lm | 8 | 0.000 | 0.249 | 0.047 |
| full_lm | 12 | 0.000 | 0.246 | 0.044 |
| full_lm | 16 | 0.000 | 0.246 | 0.043 |
| lasso | 4 | 0.000 | 0.274 | 0.058 |
| lasso | 8 | 0.000 | 0.250 | 0.045 |
| lasso | 12 | 0.000 | 0.236 | 0.039 |
| lasso | 16 | 0.000 | 0.226 | 0.036 |
We see the same decrease in Type II errors for increases in number of covariates. We can visualize this in the plot below.
Similarly, we see the selection methods performing better than including all covariates. Again for higher numbers of covariates, we see lasso has the lowest Type II error, followed by partial r, r, and full lm.
by p_good_covs
We will view the Type II error rates of each method for the different proportions of “good” covariates. In the table below, we see the minimum, maximum, and average Type II error for each level of p_good_covs by each method, for both \(b_x = 0.3\) and \(b_x = 0.5\).
b_x = 0.3
| method | p_good_covs | typeII_min | typeII_max | typeII_mean |
|---|---|---|---|---|
| no_covs | 0.25 | 0.149 | 0.686 | 0.418 |
| no_covs | 0.50 | 0.148 | 0.687 | 0.419 |
| no_covs | 0.75 | 0.146 | 0.684 | 0.418 |
| all_covs | 0.25 | 0.010 | 0.674 | 0.317 |
| all_covs | 0.50 | 0.002 | 0.663 | 0.270 |
| all_covs | 0.75 | 0.000 | 0.648 | 0.245 |
| r | 0.25 | 0.009 | 0.668 | 0.306 |
| r | 0.50 | 0.002 | 0.647 | 0.262 |
| r | 0.75 | 0.000 | 0.639 | 0.240 |
| partial_r | 0.25 | 0.009 | 0.661 | 0.301 |
| partial_r | 0.50 | 0.002 | 0.640 | 0.259 |
| partial_r | 0.75 | 0.000 | 0.633 | 0.239 |
| full_lm | 0.25 | 0.009 | 0.660 | 0.301 |
| full_lm | 0.50 | 0.001 | 0.640 | 0.262 |
| full_lm | 0.75 | 0.000 | 0.635 | 0.246 |
| lasso | 0.25 | 0.009 | 0.659 | 0.300 |
| lasso | 0.50 | 0.001 | 0.641 | 0.259 |
| lasso | 0.75 | 0.000 | 0.634 | 0.237 |
From the average column, we see decreases in Type II error rates for increases in the proportion of good covariates across all methods – except no covariates, as this is independent of the proportion of good covariates. We can visualize the trends more clearly in the plot below.
In addition to the decrease in error mentioned above, we see that including all covariates has a higher Type II error than selecting the covariates, especially for lower proportions of good covariates. We see that lasso has the lowest Type II error rate, again more so for lower proportions of good covariates.
b_x = 0.5
| method | p_good_covs | typeII_min | typeII_max | typeII_mean |
|---|---|---|---|---|
| no_covs | 0.25 | 0.001 | 0.310 | 0.103 |
| no_covs | 0.50 | 0.001 | 0.307 | 0.103 |
| no_covs | 0.75 | 0.001 | 0.307 | 0.103 |
| all_covs | 0.25 | 0.000 | 0.292 | 0.062 |
| all_covs | 0.50 | 0.000 | 0.270 | 0.047 |
| all_covs | 0.75 | 0.000 | 0.263 | 0.041 |
| r | 0.25 | 0.000 | 0.281 | 0.055 |
| r | 0.50 | 0.000 | 0.256 | 0.043 |
| r | 0.75 | 0.000 | 0.239 | 0.039 |
| partial_r | 0.25 | 0.000 | 0.273 | 0.053 |
| partial_r | 0.50 | 0.000 | 0.250 | 0.042 |
| partial_r | 0.75 | 0.000 | 0.234 | 0.038 |
| full_lm | 0.25 | 0.000 | 0.273 | 0.055 |
| full_lm | 0.50 | 0.000 | 0.254 | 0.046 |
| full_lm | 0.75 | 0.000 | 0.246 | 0.043 |
| lasso | 0.25 | 0.000 | 0.274 | 0.054 |
| lasso | 0.50 | 0.000 | 0.251 | 0.042 |
| lasso | 0.75 | 0.000 | 0.235 | 0.037 |
From the table, we see decreases in Type II error for higher proportions of good covariates for methods that do include covariates. We can see the details in the plot below.
We again see that including all covariates has a higher Type II error rate than selecting covariates to include, although this method improves for higher proportions of good covariates. Partial r performs best for a lower proportion of good covariates while lasso performs best for a higher proportion.
by n_good_covs
We can look at the interaction between the number of covariates and the proportion of good covariates to calculate the number of good covariates: \(n\_good\_covs = n\_covs \times p\_good\_covs\). This represents the number of covariates in the model that have a nonzero relationship with \(Y\).
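Crossing the manipulated levels gives the distinct n_good_covs values used in this section, which can be verified with a short sketch:

```python
from itertools import product

n_covs_levels = [4, 8, 12, 16]
p_good_levels = [0.25, 0.50, 0.75]

# Distinct numbers of "good" covariates implied by the design
n_good_levels = sorted({round(n * p) for n, p in product(n_covs_levels, p_good_levels)})
print(n_good_levels)  # [1, 2, 3, 4, 6, 8, 9, 12]
```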
b_x = 0.3
| method | n_good_covs | typeII_min | typeII_max | typeII_mean |
|---|---|---|---|---|
| no_covs | 1 | 0.149 | 0.686 | 0.418 |
| no_covs | 2 | 0.151 | 0.687 | 0.419 |
| no_covs | 3 | 0.148 | 0.682 | 0.417 |
| no_covs | 4 | 0.149 | 0.685 | 0.419 |
| no_covs | 6 | 0.149 | 0.683 | 0.419 |
| no_covs | 8 | 0.148 | 0.685 | 0.418 |
| no_covs | 9 | 0.151 | 0.684 | 0.419 |
| no_covs | 12 | 0.146 | 0.682 | 0.417 |
| all_covs | 1 | 0.070 | 0.670 | 0.358 |
| all_covs | 2 | 0.033 | 0.665 | 0.319 |
| all_covs | 3 | 0.016 | 0.665 | 0.292 |
| all_covs | 4 | 0.010 | 0.674 | 0.281 |
| all_covs | 6 | 0.003 | 0.649 | 0.249 |
| all_covs | 8 | 0.002 | 0.663 | 0.242 |
| all_covs | 9 | 0.001 | 0.645 | 0.228 |
| all_covs | 12 | 0.000 | 0.648 | 0.222 |
| p_hacked | 1 | 0.054 | 0.596 | 0.305 |
| p_hacked | 2 | 0.019 | 0.558 | 0.240 |
| p_hacked | 3 | 0.007 | 0.538 | 0.197 |
| p_hacked | 4 | 0.003 | 0.485 | 0.161 |
| p_hacked | 6 | 0.001 | 0.472 | 0.144 |
| p_hacked | 8 | 0.001 | 0.395 | 0.115 |
| p_hacked | 9 | 0.000 | 0.431 | 0.124 |
| p_hacked | 12 | 0.000 | 0.400 | 0.109 |
| r | 1 | 0.069 | 0.668 | 0.354 |
| r | 2 | 0.032 | 0.652 | 0.312 |
| r | 3 | 0.015 | 0.642 | 0.284 |
| r | 4 | 0.009 | 0.642 | 0.268 |
| r | 6 | 0.003 | 0.633 | 0.242 |
| r | 8 | 0.002 | 0.638 | 0.229 |
| r | 9 | 0.001 | 0.633 | 0.223 |
| r | 12 | 0.000 | 0.623 | 0.213 |
| partial_r | 1 | 0.068 | 0.661 | 0.351 |
| partial_r | 2 | 0.032 | 0.640 | 0.309 |
| partial_r | 3 | 0.015 | 0.633 | 0.280 |
| partial_r | 4 | 0.009 | 0.627 | 0.263 |
| partial_r | 6 | 0.003 | 0.626 | 0.240 |
| partial_r | 8 | 0.002 | 0.627 | 0.225 |
| partial_r | 9 | 0.001 | 0.629 | 0.222 |
| partial_r | 12 | 0.000 | 0.624 | 0.212 |
| full_lm | 1 | 0.068 | 0.660 | 0.351 |
| full_lm | 2 | 0.032 | 0.641 | 0.309 |
| full_lm | 3 | 0.016 | 0.635 | 0.281 |
| full_lm | 4 | 0.009 | 0.629 | 0.264 |
| full_lm | 6 | 0.003 | 0.626 | 0.244 |
| full_lm | 8 | 0.001 | 0.629 | 0.233 |
| full_lm | 9 | 0.001 | 0.624 | 0.230 |
| full_lm | 12 | 0.000 | 0.617 | 0.226 |
| lasso | 1 | 0.068 | 0.659 | 0.351 |
| lasso | 2 | 0.033 | 0.641 | 0.310 |
| lasso | 3 | 0.016 | 0.634 | 0.280 |
| lasso | 4 | 0.009 | 0.623 | 0.262 |
| lasso | 6 | 0.003 | 0.618 | 0.240 |
| lasso | 8 | 0.001 | 0.610 | 0.222 |
| lasso | 9 | 0.001 | 0.609 | 0.219 |
| lasso | 12 | 0.000 | 0.594 | 0.206 |
In the table, we see decreases in Type II error rates as the number of good covariates increases across methods that include covariates. We can further visualize this in the plot.
In the plot, we see the decreasing trend in Type II error. Including all covariates yields higher Type II error than selecting them. Lasso has the lowest Type II error, especially as the number of good covariates increases.
b_x = 0.5
| method | n_good_covs | typeII_min | typeII_max | typeII_mean |
|---|---|---|---|---|
| no_covs | 1 | 0.001 | 0.310 | 0.104 |
| no_covs | 2 | 0.001 | 0.307 | 0.103 |
| no_covs | 3 | 0.001 | 0.301 | 0.102 |
| no_covs | 4 | 0.001 | 0.308 | 0.103 |
| no_covs | 6 | 0.001 | 0.305 | 0.103 |
| no_covs | 8 | 0.001 | 0.305 | 0.102 |
| no_covs | 9 | 0.001 | 0.307 | 0.104 |
| no_covs | 12 | 0.001 | 0.303 | 0.103 |
| all_covs | 1 | 0.000 | 0.286 | 0.075 |
| all_covs | 2 | 0.000 | 0.275 | 0.059 |
| all_covs | 3 | 0.000 | 0.283 | 0.052 |
| all_covs | 4 | 0.000 | 0.292 | 0.051 |
| all_covs | 6 | 0.000 | 0.260 | 0.041 |
| all_covs | 8 | 0.000 | 0.270 | 0.043 |
| all_covs | 9 | 0.000 | 0.243 | 0.038 |
| all_covs | 12 | 0.000 | 0.263 | 0.039 |
| p_hacked | 1 | 0.000 | 0.222 | 0.056 |
| p_hacked | 2 | 0.000 | 0.191 | 0.037 |
| p_hacked | 3 | 0.000 | 0.170 | 0.027 |
| p_hacked | 4 | 0.000 | 0.138 | 0.020 |
| p_hacked | 6 | 0.000 | 0.129 | 0.018 |
| p_hacked | 8 | 0.000 | 0.093 | 0.013 |
| p_hacked | 9 | 0.000 | 0.109 | 0.015 |
| p_hacked | 12 | 0.000 | 0.097 | 0.013 |
| r | 1 | 0.000 | 0.281 | 0.072 |
| r | 2 | 0.000 | 0.257 | 0.056 |
| r | 3 | 0.000 | 0.251 | 0.048 |
| r | 4 | 0.000 | 0.246 | 0.044 |
| r | 6 | 0.000 | 0.235 | 0.038 |
| r | 8 | 0.000 | 0.231 | 0.036 |
| r | 9 | 0.000 | 0.223 | 0.035 |
| r | 12 | 0.000 | 0.231 | 0.035 |
| partial_r | 1 | 0.000 | 0.273 | 0.070 |
| partial_r | 2 | 0.000 | 0.250 | 0.054 |
| partial_r | 3 | 0.000 | 0.236 | 0.046 |
| partial_r | 4 | 0.000 | 0.226 | 0.041 |
| partial_r | 6 | 0.000 | 0.225 | 0.037 |
| partial_r | 8 | 0.000 | 0.221 | 0.035 |
| partial_r | 9 | 0.000 | 0.220 | 0.035 |
| partial_r | 12 | 0.000 | 0.232 | 0.035 |
| full_lm | 1 | 0.000 | 0.273 | 0.070 |
| full_lm | 2 | 0.000 | 0.254 | 0.055 |
| full_lm | 3 | 0.000 | 0.245 | 0.048 |
| full_lm | 4 | 0.000 | 0.244 | 0.044 |
| full_lm | 6 | 0.000 | 0.246 | 0.042 |
| full_lm | 8 | 0.000 | 0.244 | 0.042 |
| full_lm | 9 | 0.000 | 0.240 | 0.041 |
| full_lm | 12 | 0.000 | 0.246 | 0.042 |
| lasso | 1 | 0.000 | 0.274 | 0.071 |
| lasso | 2 | 0.000 | 0.251 | 0.055 |
| lasso | 3 | 0.000 | 0.236 | 0.047 |
| lasso | 4 | 0.000 | 0.227 | 0.042 |
| lasso | 6 | 0.000 | 0.223 | 0.037 |
| lasso | 8 | 0.000 | 0.210 | 0.034 |
| lasso | 9 | 0.000 | 0.208 | 0.033 |
| lasso | 12 | 0.000 | 0.207 | 0.031 |
In the table, we see the Type II error decreasing as the number of good covariates increases. We can get a more nuanced view in the plot below.
Similarly, we see lasso performing best for higher numbers of good covariates. For smaller numbers of good covariates, partial r and lasso perform comparably well.
by correlations
We will look at the Type II error rates of each method by the correlation between \(Y\) and the good covariates, for both \(b_x = 0.3\) and \(b_x = 0.5\).
b_x = 0.3
| method | r_ycov | r_cov | typeII_min | typeII_max | typeII_mean |
|---|---|---|---|---|---|
| no_covs | 0.3 | 0.3 | 0.149 | 0.685 | 0.418 |
| no_covs | 0.5 | 0.3 | 0.146 | 0.687 | 0.419 |
| all_covs | 0.3 | 0.3 | 0.076 | 0.674 | 0.364 |
| all_covs | 0.5 | 0.3 | 0.000 | 0.618 | 0.191 |
| r | 0.3 | 0.3 | 0.073 | 0.668 | 0.356 |
| r | 0.5 | 0.3 | 0.000 | 0.610 | 0.183 |
| partial_r | 0.3 | 0.3 | 0.073 | 0.661 | 0.352 |
| partial_r | 0.5 | 0.3 | 0.000 | 0.606 | 0.181 |
| full_lm | 0.3 | 0.3 | 0.084 | 0.660 | 0.355 |
| full_lm | 0.5 | 0.3 | 0.000 | 0.606 | 0.184 |
| lasso | 0.3 | 0.3 | 0.072 | 0.659 | 0.349 |
| lasso | 0.5 | 0.3 | 0.000 | 0.605 | 0.181 |
In the table, we see that the Type II error decreases as the correlation between \(Y\) and the good covariates increases. The Type II error is highest for including no covariates. We can visualize this below.
In the plots, we see that the no covariates method has the highest Type II error across correlation levels, and that Type II errors are lower at the higher correlation between \(Y\) and the good covariates. We also see slight decreases in Type II error when moving from including all covariates to selecting them.
b_x = 0.5
| method | r_ycov | typeII_min | typeII_max | typeII_mean |
|---|---|---|---|---|
| no_covs | 0.3 | 0.001 | 0.310 | 0.103 |
| no_covs | 0.5 | 0.001 | 0.308 | 0.103 |
| all_covs | 0.3 | 0.000 | 0.292 | 0.080 |
| all_covs | 0.5 | 0.000 | 0.203 | 0.021 |
| r | 0.3 | 0.000 | 0.281 | 0.073 |
| r | 0.5 | 0.000 | 0.191 | 0.018 |
| partial_r | 0.3 | 0.000 | 0.273 | 0.071 |
| partial_r | 0.5 | 0.000 | 0.189 | 0.018 |
| full_lm | 0.3 | 0.000 | 0.273 | 0.076 |
| full_lm | 0.5 | 0.000 | 0.188 | 0.019 |
| lasso | 0.3 | 0.000 | 0.274 | 0.070 |
| lasso | 0.5 | 0.000 | 0.190 | 0.018 |
From the table, we see that the Type II error decreases as the correlation between \(Y\) and the good covariates increases.
In the plots, we see again that the no covariates approach has the highest Type II error across correlations. There is a slight decrease in Type II error when selecting covariates instead of using all covariates. Among the selection methods, full lm has a slightly higher Type II error than the others. The Type II error rates are lower across all methods when the correlation between \(Y\) and the good covariates is higher.
Estimate, SD, & SE
Here we compare, across methods, the estimate of \(b_x\), the standard deviation of the estimate, and the average standard error of the estimate. The standard deviation is the SD of the sampling distribution of the estimates across simulations. The standard error comes from the linear model output. Because the arithmetic mean of standard errors would be biased, we compute the average SE as the square root of the mean of the squared standard errors (the root mean square). We then compare the two by subtracting this average SE from the calculated SD.
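The SD and average-SE calculation described above can be sketched as follows; the values here are illustrative, not the simulation output.

```python
import numpy as np

# Per-simulation estimates and standard errors (illustrative values)
estimates = np.array([0.31, 0.28, 0.35, 0.27])
SEs = np.array([0.151, 0.148, 0.156, 0.150])

# SD of the sampling distribution of the estimates
SD_estimate = np.std(estimates, ddof=1)

# Average SE: root mean square of the per-simulation SEs,
# not the arithmetic mean, which would be biased
mean_SE = np.sqrt(np.mean(SEs ** 2))

difference = SD_estimate - mean_SE
```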
b_x = 0.3
| method | mean_estimate | SD_estimate | mean_SE | difference |
|---|---|---|---|---|
| no_covs | 0.300 | 0.148 | 0.148 | 0.000 |
| all_covs | 0.300 | 0.124 | 0.124 | 0.000 |
| r | 0.299 | 0.121 | 0.121 | 0.000 |
| partial_r | 0.300 | 0.122 | 0.121 | 0.001 |
| full_lm | 0.300 | 0.125 | 0.122 | 0.004 |
| lasso | 0.300 | 0.123 | 0.120 | 0.002 |
Here we see that all methods recover \(b_x = 0.3\) on average, except the r approach, which yields a slightly lower mean estimate (0.299). The no covariates, all covariates, and r approaches show no difference between the calculated SD and the linear model SE, while the partial r, full lm, and lasso approaches show slight differences.
b_x = 0.5
| method | mean_estimate | SD_estimate | mean_SE | difference |
|---|---|---|---|---|
| no_covs | 0.500 | 0.148 | 0.148 | 0.000 |
| all_covs | 0.500 | 0.124 | 0.124 | 0.000 |
| r | 0.498 | 0.121 | 0.121 | 0.000 |
| partial_r | 0.500 | 0.122 | 0.121 | 0.001 |
| full_lm | 0.500 | 0.125 | 0.122 | 0.004 |
| lasso | 0.500 | 0.123 | 0.120 | 0.002 |
Similarly, all methods recover \(b_x = 0.5\) on average, except the r approach, which again yields a slightly lower mean estimate (0.498). The no covariates, all covariates, and r approaches show no difference between the calculated SD and the linear model SE, while the partial r, full lm, and lasso approaches show slight differences.
Sampling Distributions
Here we view sampling distributions of the estimate for \(b_x\) for each method, for both \(b_x = 0.3\) and \(b_x = 0.5\).
b_x = 0.3
In the plot, we can see that the distributions for all methods are centered around 0.3. The no covariates approach has the widest distribution.
b_x = 0.5
In the plot, we can see that the distributions for all methods are centered around 0.5. The no covariates approach has the widest distribution.
Conclusions
We compared 7 methods for selecting covariates to include in linear models. In the first section, on Type I errors, we demonstrated that the p-hacking approach is not a statistically valid method, as it led to inflated Type I error rates and biased parameter estimates. The remaining 6 methods were all shown to be statistically valid, and can be further compared by their Type II error results. Overall, using no covariates performed the worst, yielding the highest Type II error. Including all covariates reduced the Type II error, and using one of the selection methods reduced it further. A comparison of the selection methods across different research settings showed that they yielded similar Type II errors; however, for larger numbers of covariates and larger proportions of good covariates, lasso and partial r had the lowest Type II errors.